ABSTRACT

Military requirements for voice channel authentication are cited. A
state-of-the-art assessment of speech recognition and vocoder
techniques is related to voice authentication problems and
requirements. Two in-house devices are suggested for feasibility and
system approach studies. Non-military benefits are also briefly
discussed.

VOICE CHANNEL AUTHENTICATION

A Proposal for Scientific Study and a Special Vocoder System Channel

1. INTRODUCTION

Military requirements for authentication of voices transmitted over
special communication channels are of a limited but extremely important
nature. The requirement for authentication when verbal orders for
massive retaliatory destruction must be transmitted is easily
appreciated as one of several military needs. This paper principally
concerns itself with the problem of exactly identifying a person who
may be using such a channel, and techniques will be described for
dynamically accomplishing such a "pseudo-fingerprint" mission. Related
problems and applications are also discussed. The ability to cope with
the general problem of authentication will be based on general
state-of-art techniques plus some novel considerations.

In particular, two in-house programs are suggested for feasibility
and systems development studies. The first program discusses a study
and development of speech characterizing patterns which could be
performed by the vocoder-computer-pattern-storage techniques proposed
by C. P. Smith (1). The information on voice pattern statistics and
coding, which is available or programmable from Smith's equipment, is
shown to be implementable for authentication studies and
instrumentation. The second program suggests the utilization of the
Polar Coordinate Convertor in the manner shown by F. Vilbig (2) for
studying the phase contribution and invariance of such phase for speech
polar diagram patterns. Such studies would be useful for ascertaining
the feasibility of using speech spectrum phase information for
distinguishing a particular voice from a random set. The success of
such studies would be an important clue as to the potentiality of voice
authentication techniques in general.

The similarity of difficulties associated with exactly authenticating
a voice which has been transmitted over a communication channel
(authentication) and of establishing the identity of a stored voice
message, such as might appear on a recorded tape (identification), is
considered from the viewpoint of the intrinsic speech analysis which
must be performed. The identification aspect is thereafter assumed
part of the authentication problem.

2. PROBLEMS OF VOICE AUTHENTICATION

A basic problem of authentication is the degree of reliability
that may be attained. The question may well be posed, "Is it possible
to identify a voice detected from a radio-communication channel with
the same reliability realized in fingerprint identification?" A more
practical aspect of this problem would be the question of how much
reliability an authentication channel must or should have. The answers
to these questions are not known, but the state-of-art in speech
analysis, especially with respect to pattern recognition and pattern
coding, has led the author to believe that the time is now ripe to
evaluate these problems. He is further led to believe that at least a
good degree of reliability may be achieved for voice authentication.
This feeling is partially confirmed by the limited results achieved by
H. Dudley and S. Balashek (3) in their Automatic Speech Recognizer
experiments. Here their machine responded properly only when it was
verbally addressed by a particular voice (the designer's).

The possible problems involved in consistently identifying a voice
over a radio link will be varied. Biological changes in the talker's
vocal passages, such as hoarseness, extracted teeth, etc., could play
havoc with the system. Instrumentation environment and transmission
distortions must also be taken into account. These problems may be
taken care of in the instrumentation and transmission techniques of
the special authentication channel. A problem which is more basic to
the overall situation and which cannot be handled by instrumentation
techniques is that of inherent voice pattern consistency. It must first
be shown that vocal "fingerprint" characteristics do exist and are
maintained under varying conditions. It is thus concluded that a
program of scientific studies and special system instrumentation must
be initiated to gain a better insight into these problems. The need for
a broader approach to speech recognition (which is germane to
authentication) was voiced by E. E. David (4) at the recent AFCRC
seminar on speech compression. This is indeed true in the specialized
area of authentication, since prior studies in speech recognition have
been oriented toward common (invariant) attributes of voice signals
rather than particular (personal) attributes. Studies of optimum
criteria for authentication are seriously lacking. Such studies could
point to identification characteristics other than those present in a
speech spectrum pattern. Therefore a concerted effort, including those
related scientific disciplines which might profitably contribute, is
necessary to evaluate the overall problem, eliminate the many "garbage"
techniques and assure a near future solution.

3. VOCODER APPROACH

The state-of-art of vocoders was well described at the recent AFCRC
seminar on speech compression (session B of the seminar proceedings).
Performance data would indicate that digitized vocoders could be
profitably used for secure transmission of information. Here the voice
information would be digitally coded, and the interposition of a
special coding black box could give privacy to a particular channel. A
serious disadvantage of using such a vocoder over a sensitive military
command channel (requiring authentication) is that the synthesized
voice at the receiver terminal could never recapture the exact original
pitch of the sender, allowing limited but not authentic recognition of
the received synthesized voice. This situation could allow transmission
of vital verbal commands which might not be "authentic". Authentication
procedures must prevent such situations from arising.


It might be profitable at this point to reflect on the difference
between listener voice recognition and receiver voice authentication.
Recognition here implies that the voice "sounds like" a particular
person, where a priori information of the sender's voice is required by
the person receiving the message. Authentication requires that the
sender's voice is positively identified, and this type of
identification may or may not require the person receiving the message
to be familiar with the sender's voice. The authentication technique
could be to store pattern information regarding the particular sender's
voice at the receiving terminal. Here it would compare the incoming
digitized voice patterns with prestored information in order to make an
authentication decision. In the case of marginal authentication
decisions, repeat transmissions or special questions might be requested
from the sender to improve the correlation of authenticity.
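
The comparison of an incoming pattern with prestored information, with
a marginal region that triggers a repeat request, might be sketched as
follows. This is an illustrative sketch only and not the procedure of
any system cited here: the function names, the choice of a Pearson
correlation as the similarity measure, and the two threshold values are
all assumptions introduced for the example.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equal-length pattern vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat pattern carries no discriminating information
    return cov / math.sqrt(var_a * var_b)

def authenticate(incoming, stored, accept=0.9, reject=0.6):
    """Three-way decision on one pattern (thresholds are hypothetical).

    Returns "accept", "reject", or "repeat" (marginal case, in which a
    repeat transmission or a special question would be requested).
    """
    score = correlation(incoming, stored)
    if score >= accept:
        return "accept"
    if score < reject:
        return "reject"
    return "repeat"

# Hypothetical quantized spectrum pattern stored at the receiving terminal.
stored = [3, 5, 7, 6, 4, 2, 1, 1]

print(authenticate([3, 5, 7, 6, 4, 2, 1, 1], stored))
print(authenticate([5, 3, 6, 7, 2, 4, 1, 2], stored))
print(authenticate([1, 1, 2, 4, 6, 7, 5, 3], stored))
```

In practice the decision would presumably be accumulated over many
patterns rather than taken on a single one; the single-pattern form is
used here only to keep the sketch short.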

The vocoder system properties which best lend themselves to
authentication are the pitch channel parameters and the
time-multiplexed, amplitude-quantized spectrum patterns which
quasi-continuously characterize the speech spectral energy. Computer
programming procedures would be best suited for analyzing such
parameters or pattern streams with respect to a particular talker, with
information comparison and storage techniques utilized to assign an
authentication weighting to a channel user's voice patterns. Techniques
for deemphasis of invariant patterns and enhancement of singular events
are necessary.
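
One possible way to deemphasize invariant patterns and enhance
talker-specific ones is to weight each vocoder channel by how much it
varies between talkers relative to how much it varies within a single
talker. The sketch below uses this variance-ratio weighting, which is a
technique substituted here for illustration and is not taken from the
cited work; the function name, the data layout, and the sample values
are hypothetical.

```python
from statistics import mean, pvariance

def channel_weights(frames_by_talker):
    """Return one weight per channel: between-talker / within-talker variance.

    frames_by_talker maps a talker label to a list of frames, each frame
    being a list of channel amplitudes (hypothetical data layout).
    """
    talkers = list(frames_by_talker)
    n_channels = len(frames_by_talker[talkers[0]][0])
    weights = []
    for ch in range(n_channels):
        per_talker = [[frame[ch] for frame in frames_by_talker[t]]
                      for t in talkers]
        talker_means = [mean(values) for values in per_talker]
        between = pvariance(talker_means)
        within = mean([pvariance(values) for values in per_talker])
        # Small constant avoids division by zero for perfectly steady channels.
        weights.append(between / (within + 1e-9))
    return weights

# Made-up frames: channels 0 and 1 behave alike for both talkers
# (invariant); channel 2 separates the talkers (singular).
frames_by_talker = {
    "A": [[5, 3, 1], [5, 4, 1], [5, 3, 2]],
    "B": [[5, 3, 6], [5, 4, 7], [5, 3, 6]],
}
weights = channel_weights(frames_by_talker)
print(weights)
```

Channels whose behaviour is common to all talkers receive a weight near
zero, so they contribute little to an authentication score built from
the weighted channels.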


An in-house device for authentication studies and instrumentation
development will now be discussed. The approach by C. P. Smith (1) of
statistically encoding and decoding speech in real time for analyzing
the acoustic domain of speech (using a digitized vocoder in conjunction
with data processing equipment) could be modified for authentication
purposes. His system, in part, can encode speech spectrum patterns from
an 18 channel vocoder to gain statistical information about the
invariant and transitional speech patterns which are applicable to a
large population of talkers. The data is programmed into a special
purpose computer, in real time, which automatically acquires a
statistical description of the time-multiplexed frequency amplitude
patterns of large speech samples. It should be possible to use similar
techniques for authentication requirements. As against normal use of
the system, the low correlation of certain phonetic patterns could now
be used to enhance authentication of a particular voice. The single
talker case would thus require fewer speech patterns than the normal
use of digitally coded speech transmission. This could mean that an
authentication channel might be made higher fidelity than a normal
channel, assuming equal transmission terminal equipment and bandwidth.
This is entirely in keeping with the authentication motif, namely
highly recognizable synthesized speech. It could well turn out from
authentication investigations that the optimum digitized speech
transmission and synthesis method, for limited cases, would be to use
authentication techniques.
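
The idea of acquiring a statistical description of time-multiplexed
spectrum patterns can be illustrated with a simple pattern counter. The
sketch below is not a description of Smith's equipment: only the figure
of 18 channels comes from the text, while the number of quantization
levels, the function names, and the sample frames are assumptions made
for the example.

```python
from collections import Counter

N_CHANNELS = 18  # channel count stated in the text
N_LEVELS = 4     # hypothetical number of amplitude quantization levels

def quantize(frame, max_amplitude=1.0):
    """Map each channel amplitude in [0, max_amplitude] to an integer level."""
    levels = []
    for amplitude in frame:
        level = int(amplitude / max_amplitude * N_LEVELS)
        levels.append(min(max(level, 0), N_LEVELS - 1))
    return tuple(levels)

def pattern_statistics(frames):
    """Count how often each quantized spectrum pattern occurs."""
    return Counter(quantize(frame) for frame in frames)

# Three made-up frames: two quiet and identical, one loud.
frames = [[0.1] * N_CHANNELS, [0.1] * N_CHANNELS, [0.9] * N_CHANNELS]
stats = pattern_statistics(frames)

for pattern, count in stats.most_common():
    print(count, pattern)
```

A table of this kind, gathered for one talker, could be compared with
the table for a large population to find patterns that are common for
the talker but rare in general.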




The improvement of digitized vocoder operation or bandwidth reduction
aspects should not be tried before a more basic consideration has been
studied. This is the question of voice discrimination against
non-authentic channel users. It must first be shown, for instance, that
a competent ventriloquist could not fool the authenticating system. To
gain some special information about the ability of a voice spectrum to
remain critically invariant for authentication purposes, another
in-house study approach is suggested in the next section.

4. SPEECH PHASE PATTERN STUDIES

A speech pattern is not only representative of its defining sound
but also includes special characteristics of the person uttering the
sound. Up to now only vocoder time patterns have been discussed, such
patterns typifying the analog portrayal of the speech spectrum power
density. These patterns are accordingly used to synthesize a talker's
voice at a remote terminal. It would be useful, for authentication
purposes, to examine other characteristics of the speech wave such as
its component phase attributes. There is, in house, a device called the
Polar Coordinate Convertor which allows the examination of complex
spectra in polar pattern form. It turns out that when speech is
examined by this device (2) the resulting pattern is highly sensitive
to the phase analysis of complex speech sounds.

The Polar Coordinate Convertor is essentially a device which portrays
complex spectra on an oscilloscope in polar form. It was designed
primarily to measure the phase and amplitude distortion which might
obtain within a modulated carrier after it had been propagated through
a non-linear space. It provides means for photographing the spectrum
patterns in polar or rectified Cartesian form. The Cartesian patterns
may be photographed on 35 mm film continuously or in a time-multiplexed
column array. The multiplex technique allows for compact assessment
and rapid evaluation of massive amounts of pattern data such as would
be necessary for a thorough speech analysis.

Since speech spectral components are essentially harmonic in nature
during vowel utterances, the polar mode of the device realizes an
almost infinite variety of interlooped patterns for a given set of
sounds. For one given sound, a different pattern may be realized for
each of the various talkers uttering the sound. The numerous patterns
amongst talkers, for a given sound, are due to the varied phase and
frequency of the spectrum components. This may be more easily
understood by considering the human differences in vocal pitch, teeth
positions, tongue sizes, oral cavity resonances and muscular actions
which affect pitch overtones. It is not presently known how well phase
patterns may characterize or discriminate a particular voice.
Scientific information is very meager on this point since, classically,
phase information of speech spectrum components has been neglected, it
having been shown to contribute practically nothing to speech
understandability (first pointed out by Helmholtz in 1877). The
availability and use of the Polar Coordinate Convertor should
facilitate getting such phase information and evaluating its utility
for authentication purposes.
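
The distinction between the amplitude and the phase of harmonic
components can be made concrete with a small numerical sketch. It does
not model the circuitry of the Polar Coordinate Convertor, which is not
described here; it only shows that two harmonic waves can share the
same component amplitudes while differing in component phase. The
function names, the three-harmonic wave, and the phase values are
assumptions chosen for illustration.

```python
import cmath
import math

def harmonic_wave(amplitudes, phases, n_samples=64):
    """One pitch period built from harmonics 1, 2, ... of the fundamental."""
    return [
        sum(a * math.cos(2 * math.pi * (k + 1) * n / n_samples + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases)))
        for n in range(n_samples)
    ]

def polar_spectrum(samples, n_harmonics):
    """Return (magnitude, phase) for harmonics 1..n_harmonics via a direct DFT."""
    n_samples = len(samples)
    result = []
    for k in range(1, n_harmonics + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, x in enumerate(samples))
        coeff *= 2 / n_samples  # scale so the magnitude equals the amplitude
        result.append(cmath.polar(coeff))
    return result

amplitudes = [1.0, 0.5, 0.25]  # same harmonic amplitudes for both waves
wave_a = harmonic_wave(amplitudes, [0.0, 0.0, 0.0])
wave_b = harmonic_wave(amplitudes, [0.0, 1.0, -2.0])

spectrum_a = polar_spectrum(wave_a, 3)
spectrum_b = polar_spectrum(wave_b, 3)
for (mag_a, ph_a), (mag_b, ph_b) in zip(spectrum_a, spectrum_b):
    print(round(mag_a, 6), round(mag_b, 6), round(ph_a, 6), round(ph_b, 6))
```

An analysis that keeps only the magnitudes cannot tell the two waves
apart, whereas a polar (magnitude and phase) description can, which is
the property the proposed phase studies would exploit.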

5. OTHER APPROACHES

Phase and spectrum are not necessarily the only or best parameters
for speech-identifying pattern-coding purposes. Some desirable
parameters are cumbersome to handle or difficult to extract. From the
point of view of problem efficiency, a more sophisticated method which
would be naturally suited for authentication is that which recognizes
important phonemic or certain other linguistic features of a particular
voice. Methods such as this (4) are still plagued by problems in
phoneme classification and resynthesis for normal speech synthesis
requirements. It is possible that by combining some important phonemic
features in parallel with special spectrum pattern information the
ideal pattern classification method for authentication may be attained.
It is very likely that such a synthesis method would minimize its
problem plague, and could be made very successful for synthesis
purposes in the single talker case encountered in an authentication
channel. Other aids for authentication would be the computer speech
programming studies currently under investigation at Bell Telephone
Labs. and M.I.T. (see also papers in AFCRC proceedings of symposium on
speech compression and processing, AFCRC TR-59-198, Vols. I and II).

The requirements for authentication are not necessarily limited
to military usage, and for such non-military cases, depending on the
degree of authentication required, other novel approaches might prove
of greater utility. For instance, one might use patterns which included
dynamic physiological or biological information of the channel user.
Such pattern features might be derived from lie detector equipment
responses, physiological responses or other characterizing features
that could be selected more desirably by medical rather than electronic
scientists. Speech researchers could look further into such abstract
defining parameters as voice vibrato, breathing, explosiveness of
utterances, etc. Linguists, statisticians and speech therapeutists may
well provide interesting and valuable cues to this special aspect of
authentication.

6. OPERATIONAL USES

A voice authenticating channel may be utilized in several ways for
military requirements. A prime requisite would be to authenticate
highly sensitive command channels that might be tied into a global
communications network. Here the requirement for preventing enemy
encroachment into the channel for transmitting non-authentic commands
is easily met. The authenticating channel would easily sense such
foreign voices.

Normally the authenticating channel would be one of a multi-channel
carrier system. Should the situation require, it would be possible to
make the authentication channel transmit nonsense information in the
case of a non-authorized user. As a countermeasure, it could be used
as an additional constraint on a coded channel. This might be done in
several ways. Noteworthy would be to invert the patterns in such a
manner that there would be no correlation between regular channel
patterns and those of the authentication channel.

"Fingerprint" identification of voices could open up a whole new
field of application of military requirements for recognition or
identification. The combat problem of airplane identification (IFF)
could be augmented by propeller or jet signal identifiers. This would
be for situations where IFF was inoperative but radio contact or
acoustic signal detection was possible. Unfamiliar voices in such
air-to-ground transmissions could be recognized by comparison with
centrally prestored information about the pilot in question. The whole
idea of secure voice transmission might have to be reevaluated once
"fingerprint" identification of voices became a practical matter.

New military operational requirements for authentication and
identification may appear when manned space flight becomes a reality.
It is difficult to predict what exigencies will prevail at such a time.
For instance, an unlikely but nevertheless possible situation might
be the case of voice communication from an environment whose composite
air density might be different from that on earth. This could cause
the pitch of the talker to change sufficiently so that his voice might
not be recognizable. By using techniques acquired for voice
authentication, it should be possible to synthesize a recognizable
voice from its distorted pattern set. The general problem of
transmuting voices is intriguing and might find other military
usefulness. This would be in the realm of countermeasures where talker
concealment rather than authentication would be involved.

Non-military applications should also be considered for the useful
benefits that might be derived. For instance, identification of taped
voices used as evidence in courts of law could now be made certain by
comparing the taped voice with that of the person in question. Many
other commercial applications will certainly arise once the basic
techniques have found continued successful application.

7. SUMMARY

Vital military needs and uses for voice channel authentication
have been discussed. It is indicated, from state-of-art considerations,
that voice authentication is possible, but that the degree of
authentication attainable must be investigated. Two in-house programs
have been suggested for feasibility and system approach studies.
Non-military benefits of authentication procedures have also been
discussed.